Probabilistic record linkage
نویسندگان
چکیده
Studies involving the use of probabilistic record linkage are becoming increasingly common. However, the methods underpinning probabilistic record linkage are not widely taught or understood, and therefore these studies can appear to be a 'black box' research tool. In this article, we aim to describe the process of probabilistic record linkage through a simple exemplar. We first introduce the concept of deterministic linkage and contrast this with probabilistic linkage. We illustrate each step of the process using a simple exemplar and describe the data structure required to perform a probabilistic linkage. We describe the process of calculating and interpreting matched weights and how to convert matched weights into posterior probabilities of a match using Bayes theorem. We conclude this article with a brief discussion of some of the computational demands of record linkage, how you might assess the quality of your linkage algorithm, and how epidemiologists can maximize the value of their record-linked research using robust record linkage methods.
منابع مشابه
Probabilistic Linkage of Persian Record with Missing Data
Extended Abstract. When the comprehensive information about a topic is scattered among two or more data sets, using only one of those data sets would lead to information loss available in other data sets. Hence, it is necessary to integrate scattered information to a comprehensive unique data set. On the other hand, sometimes we are interested in recognition of duplications in a data set. The i...
متن کاملDistance-based and probabilistic record linkage for re-identification of records with categorical variables
Record linkage methods are methods for identifying the presence of the same individual in different data files (re-identification). This paper studies and compares the two main existing approaches for record linkage: probabilistic and distance-based. The performance of both approaches is compared when data are categorical. To that end, a distance over ordinal and nominal scales is defined. The ...
متن کاملProbabilistic Record Linkage for Genealogical Research
The most slow and tedious job in genealogical research is searching civil or church records for information about an individual. But, this is an essential step in research. By searching multiple sources such as census records, wills, deeds, birth and death records we can compile a more complete set of information, and potentially the pedigree of an individual. When records are stored electronic...
متن کاملValidating Distance-Based Record Linkage with Probabilistic Record Linkage
This work compares two alternative methods for record linkage: distance based and probabilistic record linkage. It compares the performance of both approaches when data is categorical. To that end, a distance over ordinal and nominal scales is defined. The paper shows that, for categorical data, distance-based and probabilistic-based record linkage lead to similar results in relation to the num...
متن کاملA Survey of Probabilistic Record Matching Models, Techniques and Tools
Probabilistic record linkage regards the use of stochastic decision models to solve the problem of record linkage (also known as record matching). Data quality has became a key aspect in many institutions and the demand for novel, effective techniques is increasing. Record linkage in general has been studied in the last three decades and a solid probabilistic decision framework has been propose...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره 45 شماره
صفحات -
تاریخ انتشار 2016